SGD is the widely adopted method for training CNNs. Conceptually, it approximates the population with a randomly sampled batch and then trains all batches evenly, performing one gradient update on every batch in an epoch. In this paper, we demonstrate that Sampling Bias, Intrinsic Image Difference, and Fixed Cycle Pseudo Random Sampling differentiate batches during training, which in turn affects the learning speed on each batch. Because of this, the unbiased treatment of batches in SGD creates improper load balancing. To address this issue, we present Inconsistent Stochastic Gradient Descent (ISGD), which dynamically varies the training effort according to the learning status of each batch. Specifically, ISGD leverages techniques from Statistical Process Control to identify an undertrained batch. Once a batch is identified as undertrained, ISGD solves a new subproblem, a chasing logic with a conservative constraint, to accelerate training on that batch while avoiding drastic parameter changes. Extensive experiments on a variety of datasets demonstrate that ISGD converges faster than SGD. In training AlexNet, ISGD is 21.05% faster than SGD in reaching 56% top-1 accuracy under exactly the same experimental setup. We also extend ISGD to multi-GPU and heterogeneous distributed systems based on data parallelism, making the batch size the key to scalability. We then study the effect of the ISGD batch size on the learning rate, parallelism, synchronization cost, system saturation, and scalability, and conclude that the optimal ISGD batch size is machine dependent. Various experiments on a multi-GPU system validate this claim. In particular, ISGD trains AlexNet to 56.3% top-1 and 80.1% top-5 accuracy in 11.5 hours with 4 NVIDIA TITAN X GPUs at a batch size of 1536.
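The abstract mentions that ISGD flags undertrained batches with Statistical Process Control. The sketch below illustrates one way such a test could look; the sliding window, the 3-sigma control limit, and the monitor class are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of an SPC-style "undertrained batch" test, assuming a
# sliding window of recent losses and a 3-sigma upper control limit.
import numpy as np

class BatchLossMonitor:
    """Tracks recent batch losses and flags outliers as undertrained batches."""

    def __init__(self, window=100, n_sigma=3.0):
        self.window = window      # number of recent losses kept for the control chart
        self.n_sigma = n_sigma    # control-limit width (assumed 3-sigma rule)
        self.losses = []

    def is_undertrained(self, batch_loss):
        """Return True if batch_loss exceeds the upper control limit."""
        if len(self.losses) >= self.window:
            mean = np.mean(self.losses)
            std = np.std(self.losses)
            flagged = batch_loss > mean + self.n_sigma * std
        else:
            flagged = False       # not enough history to form control limits yet
        self.losses.append(batch_loss)
        if len(self.losses) > self.window:
            self.losses.pop(0)
        return flagged

# Usage inside a training loop: a flagged batch would receive extra training
# effort (the paper's chasing subproblem), e.g.
#   if monitor.is_undertrained(loss.item()):
#       take_extra_steps(batch)   # hypothetical helper
```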